05. Quiz: Sarsa
Say that an agent is learning to navigate the gridworld described earlier in the lesson.
![Gridworld Example](img/environment.png)
Suppose the agent is using Sarsa to search for the optimal policy, with step-size parameter $\alpha = 0.1$.
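For reference, this is the standard Sarsa update rule with $\alpha = 0.1$ substituted. Here $\gamma$ is the discount rate, which the quiz does not state:

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + 0.1\,\big(R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\big)
$$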
At the end of the 99th episode, the Q-table has the following values:
![Q-table](img/qtable2.png)
Say that at the beginning of the 100th episode, the agent starts in state 1 and selects action right. As a result, it receives a reward of -1, and the next state is state 2.
Then, at the next time step, the agent selects action right again.
![Beginning of the 100th episode](img/episode.png)
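In the previous video, you learned that at this point in time, the agent updates the Q-table. It now has the full tuple $(S_0, A_0, R_1, S_1, A_1)$ = (state 1, right, -1, state 2, right), which is everything Sarsa needs to update the entry for state 1 and action right.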
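The update at this step can be sketched in Python as below. This is a minimal illustration, not the course's implementation. It makes three assumptions. The Q-table is stored as a dictionary keyed by `(state, action)`. The discount rate is $\gamma = 1$. The two Q-values are hypothetical placeholders, not the values in the Q-table figure above. The function name `sarsa_update` is also hypothetical.

```python
def sarsa_update(Q, state, action, reward, next_state, next_action, alpha, gamma=1.0):
    """Apply one Sarsa update to Q in place and return the new Q[(state, action)]."""
    current = Q[(state, action)]
    # TD target uses the action the agent actually selected next (on-policy).
    td_target = reward + gamma * Q[(next_state, next_action)]
    # Move the current estimate a fraction alpha toward the TD target.
    Q[(state, action)] = current + alpha * (td_target - current)
    return Q[(state, action)]


# Hypothetical Q-values for illustration only (NOT read from the figure).
Q = {(1, "right"): 2.0, (2, "right"): 4.0}

# Transition from the quiz: S0 = 1, A0 = right, R1 = -1, S1 = 2, A1 = right.
# gamma = 1.0 is an assumption; the quiz does not state the discount rate.
new_value = sarsa_update(Q, 1, "right", -1, 2, "right", alpha=0.1)
print(new_value)  # 2.1
```

With these placeholder numbers the TD target is $-1 + 4 = 3$, the TD error is $3 - 2 = 1$, and the estimate moves by $0.1 \times 1$ to $2.1$. Only the entry for state 1 and action right changes. The entry for state 2 is read but not modified.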